Deep learning is competing random forest in computational docking

Authors

  • Mohamed A. Khamis
  • Walid E. Gomaa
  • Basem Galal
Abstract

Computational docking is the core process of computer-aided drug design; it aims to predict the best orientation and conformation of a small molecule (drug ligand) when bound to a large target receptor molecule (protein) to form a stable complex. Docking quality is typically measured by a scoring function: a mathematical predictive model that produces a score representing the binding free energy, and hence the stability, of the resulting complex. We analyze the performance of both learning techniques, deep learning (DL) and random forest (RF), on the scoring power (binding affinity prediction), the ranking power (relative ranking prediction), the docking power (identifying the native binding poses among computer-generated decoys), and the screening power (classifying true binders versus negative binders) using the PDBbind 2013 database. For the scoring and ranking powers, the proposed learning scoring functions depend on a wide range of features (energy terms, pharmacophore, intermolecular) that fully characterize the protein-ligand complexes (about 108 features); these features are extracted from several docking software packages available in the literature. For the docking and screening powers, the proposed learning scoring functions depend on the intermolecular features of the RF-Score (36 features) in order to use a larger number of training complexes (relative to the large number of decoys in the test set). For the scoring power, the DL RF scoring function (the arithmetic mean of the DL and RF scores) achieves a Pearson's correlation coefficient of 0.799 between the predicted and experimentally measured binding affinities, versus 0.758 for the RF scoring function. For the ranking power, the DL scoring function ranks the ligands bound to a fixed target protein with an accuracy of 54% for the high-level ranking (correctly ranking the three ligands bound to the same target protein in a cluster) and 78% for the low-level ranking (correctly ranking only the best ligand in the cluster), while the RF scoring function achieves 46% and 62%, respectively. For the docking power, the DL RF scoring function has a success rate of 36.0% when the three best-scored ligand binding poses are considered, with success defined as a pose within 2 Å root-mean-square deviation of the native pose, versus 30.2% for the RF scoring function. For the screening power, the DL scoring function has an average enrichment factor and success rate at the top 1% level of 2.69 and 6.45%, respectively, versus 1.61 and 4.84% for the RF scoring function.

Keywords: Deep learning; Neural networks; Random forest; Drug discovery; Computational docking; Virtual screening.
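The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy numbers, and the conventions that a higher score means stronger predicted binding and that "within 2 Å" includes exactly 2 Å are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ensemble_mean(dl_scores, rf_scores):
    """Arithmetic mean of the DL and RF predictions, complex by complex."""
    return [(d + r) / 2 for d, r in zip(dl_scores, rf_scores)]


def ranking_success(clusters):
    """Return (high_level, low_level) ranking success fractions.

    clusters: list of clusters, each a list of (predicted, measured) pairs
    for ligands bound to the same target. High-level success means the whole
    predicted order matches the measured order; low-level success means only
    the best ligand is identified. Assumes higher values mean stronger binding.
    """
    high = low = 0
    for cluster in clusters:
        idx = range(len(cluster))
        by_pred = sorted(idx, key=lambda i: cluster[i][0], reverse=True)
        by_true = sorted(idx, key=lambda i: cluster[i][1], reverse=True)
        high += by_pred == by_true
        low += by_pred[0] == by_true[0]
    return high / len(clusters), low / len(clusters)


def docking_success(targets, top_n=3, cutoff=2.0):
    """Fraction of targets with a near-native pose among the top_n scored poses.

    targets: list of pose lists, each pose a (predicted_score, rmsd) pair,
    where rmsd is measured against the native pose in angstroms.
    """
    hits = 0
    for poses in targets:
        best = sorted(poses, key=lambda p: p[0], reverse=True)[:top_n]
        hits += any(rmsd <= cutoff for _, rmsd in best)
    return hits / len(targets)


# Toy data (hypothetical, not taken from PDBbind).
dl = [6.0, 7.0, 5.0, 8.0]
rf = [5.0, 8.0, 6.0, 7.0]
measured = [5.5, 7.5, 5.5, 7.5]
print(pearson(ensemble_mean(dl, rf), measured))

clusters = [
    [(7.1, 8.0), (6.0, 6.5), (4.2, 5.0)],
    [(5.0, 6.0), (6.5, 7.0), (5.5, 4.0)],
]
print(ranking_success(clusters))

targets = [
    [(9.0, 5.1), (8.5, 1.2), (8.0, 3.3), (7.0, 0.8)],
    [(9.0, 4.0), (8.0, 3.0), (7.0, 2.5), (6.0, 1.0)],
]
print(docking_success(targets))
```

Averaging the two predictors before computing the correlation mirrors the abstract's description of the DL RF score; the enrichment factor is left out because the abstract does not give its exact definition.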

Similar resources

Comparison of Random Survival Forests for Competing Risks and Regression Models in Determining Mortality Risk Factors in Breast Cancer Patients in Mahdieh Center, Hamedan, Iran

Introduction: Breast cancer is one of the most common cancers among women worldwide. Patients with cancer may die due to disease progression or other types of events. These different event types are called competing risks. This study aimed to determine the factors affecting the survival of patients with breast cancer using three different approaches: cause-specific hazards regression, subdistri...

A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)

Machine learning-based classification techniques provide support for the decision-making process in the field of healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous in nature, and their high dimensionality results in a slower learning rate and higher computational cost. Feature selection is expected to deal with the high dimen...

Target Predictions using LINCS Data

Identifying the binding targets of small molecules is an essential process in drug discovery and development. The two conventional approaches include high throughput screening (HTS) and computational structural docking. HTS suffers from its high cost and time-consuming procedure, while the computational methods rely on simplifying assumptions that often lead to less accurate results. In ...

Automated Mass Detection from Mammograms using Deep Learning and Random Forest

Mass detection from mammograms plays a crucial role as a pre-processing stage for mass segmentation and classification. In this paper, we present a novel approach for detecting masses from mammograms using a cascade of deep learning and random forest classifiers. The deep learning classifier consists of a multi-scale deep belief network classifier that selects regions to be further processed by...

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and, if performed manually, very time consuming. This issue suggests the use of unsupervised models. One of the accurate methods in this regard is Spectral Ranking of Anomalies (SRA), which has been shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...

Journal title:
  • CoRR

Volume abs/1608.06665

Publication date 2016